WIP: New string conversion API #951

jtv · 2025-02-16T22:00:33Z

This changes the string conversion API, but also introduces a mechanism for maintaining backward compatibility for existing string conversions, so as to minimise the trouble for users who implement their own.

It works like this: string conversion for a type is defined by its specialisation of pqxx::string_traits. This type contains a few conversion functions: to_buf(), into_buf(), from_string(). The signatures for these functions are changing, but in ways that we can reasonably "translate" both ways.

So the new generic functions, which live outside the traits types, hide the difference. They expose only the new-style API, but each of them can call either an old-style implementation or a new-style implementation (preferring the new-style one, of course). Using concepts to check the function signatures avoids niggling incompatibilities due to optional extra parameters or slight differences in types: if the call works, the API matches.

The new API reduces the use of raw pointers, in favour of std::span<char> buffers. It also introduces an optional std::source_location argument. And, in the near future, I hope to add some encapsulation of client encoding information so that we can parse some things that aren't safe to parse right now.

jtv · 2025-02-19T12:26:22Z

I'm working to improve memory safety and verifiability, and part of that is reducing the use of raw pointers. As it stands, the new conversion API still has one pointer in it — pqxx::string_traits<TYPE>::into_buf() returns a char *. I'd like to change that.

Question is: should the function return a view on the string it wrote into the buffer? Or should it return the span of remaining buffer space, keeping it convenient to append another value? It's not a completely trivial question because there's a terminating zero between the two. Returning the pointer is elegant in that it's the only piece of information that the caller wants and doesn't already have. I could also return an index instead of a pointer, but I feel that would complicate subsetting operations and invite subtle bugs.

I may simply have to try each of the alternatives and compare them.

jtv · 2025-02-20T00:03:21Z

Actually, now that I've looked at the uses... returning an index into the buffer does look like the solution that best fits the callers' needs! And that means it's a simple solution, which in turn probably makes it less error-prone.

The new string conversion API will include encoding information. We need that for array/composite parsing. I was thinking of hiding this away in a class, but... it's starting to sound like too much overhead, both for the hardware and for humans. We've had this "encoding groups" system for years now, and haven't seen much need to change it. There are _some_ changes: I just added an `UNKNOWN` value to the enum so we can deal at least somewhat gracefully with cases where we don't have the information. And I'm looking forward to retiring a few of the encoding groups in the 8.0 release. Perhaps there are ones we'll want to add as well, but adding is cheap. Compilers and tooling will warn about unhandled cases in a `switch`, and that gives me more confidence that people won't be surprised by unexpected new groups. At least, nobody who cares about keeping their code correct, enables compiler warnings, and pays attention.
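A trimmed-down illustration of the `switch` argument. `encoding_group_sketch` is a hypothetical stand-in, not libpqxx's real encoding group enum, and throwing on `UNKNOWN` is just one possible "somewhat graceful" policy. The SJIS case reflects a real parsing hazard: in Shift-JIS the second byte of a two-byte character can fall in the ASCII range (including 0x5C, the backslash), so scanning byte by byte for delimiters is unsafe there, whereas in UTF-8 every byte of a multibyte character is 0x80 or above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, heavily trimmed encoding-group enum for illustration only.
enum class encoding_group_sketch
{
  UNKNOWN,  // We don't know the client encoding.
  MONOBYTE, // One byte per character.
  UTF8,     // Bytes of multibyte characters are all >= 0x80.
  SJIS,     // Trail bytes may fall in the ASCII range, e.g. 0x5c ('\\').
};

// Can a parser safely look for ASCII delimiters such as '"' or '\\' one
// byte at a time?  There is deliberately no "default:" label, so a new
// enumerator without a matching case triggers a -Wswitch warning.
bool bytewise_ascii_scan_is_safe(encoding_group_sketch enc)
{
  switch (enc)
  {
  case encoding_group_sketch::MONOBYTE:
  case encoding_group_sketch::UTF8: return true;
  case encoding_group_sketch::SJIS: return false;
  case encoding_group_sketch::UNKNOWN:
    throw std::runtime_error{"Cannot parse safely: client encoding unknown."};
  }
  // Only reachable for a value outside the enumerators.
  throw std::logic_error{"Invalid encoding group value."};
}
```

GCC enables `-Wswitch` as part of `-Wall`, and Clang warns about unhandled enumerators by default, so adding a group to the enum surfaces every `switch` that needs updating.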

jtv added 10 commits February 16, 2025 20:56

Start sketching out 8.0 string conversion API.

e7158cb

Replaces a lot of pointers with `std::span`, adds `source_location`. Keeping the new API backwards-compatible by checking at compile time which API each of a traits struct's conversion functions implement.

Introduce generic to_buf()/into_buf().

a226d90

These versions of the functions live _outside_ the traits types. They help hide the changes in the string conversion API in libpqxx 7.

Format.

1a0ce30

Guard against bad types.

28228b1

Add source_location param to from_string().

493e662

Use span for esc_bin()/unesc_bin().

dcbf61b

Simplify separated_list() using if constexpr.

4950dca

Use std::format().

6966d53

Retire concat() and cat2().

471eae3

Convert some funcs to the new API.

fe213ee

jtv added the 8.0 label Feb 21, 2025

jtv added 17 commits February 23, 2025 23:35

Return std::size_t from into_buf().

fef27d2

Probably lots more changes... It was a hard job.

Update doc.

9bde7a6

Documentation.

249dcb8

Use apt-get in scripts, not apt.

aa1f627

Don't need debhelper.

4d69e19

Set noninteractive Debian frontend.

4d414df

Use UTC timezone as well.

679f0c5

Don't install cmake.

ce0fd9c

Typo.

79c03f5

Bunch of detailed to_buf tests.

77509ab

Use std::format().

877d756

More std::format().

0e49aa9

Test to_buf() for arrays, dates, and ranges.

1265729

Thoroughly test to_buf() & into_buf().

0890a79

This goes further than merely testing the values these functions return. It also checks for buffer overruns, memory overwrites, terminating zero in the right place, and so on.

In CircleCI, run apt-get upgrade.

f436ff8

-y.

659ec11

Trying to work around CircleCI OS problem.

5203283

jtv added 19 commits March 2, 2025 02:24

Trying to work around CircleCI OS problem.

1f6601c

What, no exim4?

0a855ae

Ah, exim4-base.

0d0947c

Nope.

6f8ea83

Retire comment.

4eb27e8

Notes.

5165abb

Represent source_location as text.

44f5f94

Try Debian unstable.

934a068

libtoolize was missing in CI.

122f1a1

Pass std::source_location in a few more places.

024f796

Introduce UNKNOWN encoding group.

a9f6848

Introduce conversion_context.

f8ffc76

More context. More encoding group.

8317c6f

Remove some unneeded specialisations.

ea12673

A public member function helps.

a27f9eb

Rename function, avoid name clash.

e653e42

Comment.

2463572

Check for unknown encoding in one more place.

239314b
